The quest for a quantitative characterization of community and modular structure of complex
networks produced a variety of methods and algorithms to classify different networks. However,
it is not clear if such methods provide consistent, robust and meaningful results when considering
hierarchies as a whole. Part of the problem is the lack of a similarity measure for the comparison
of hierarchical community structures. In this work we contribute by introducing the hierarchical
mutual information, a generalization of the traditional mutual information that allows the
comparison of hierarchical partitions and hierarchical community structures. The normalized
version of the hierarchical mutual information should behave analogously to the traditional
normalized mutual information. Here, the correct behavior of the hierarchical mutual information is
corroborated on an extensive battery of numerical experiments. The experiments are performed on
artificial hierarchies, and on the hierarchical community structure of artificial and empirical
networks. Furthermore, the experiments illustrate some of the practical applications of the
hierarchical mutual information, namely the comparison of different community detection methods,
and the study of the consistency, robustness and temporal evolution of the hierarchical modular
structure of networks.

PACS numbers: 89.75.He,89.75.-k,89.75.Fb 


I. INTRODUCTION 

Many complex systems exhibit some degree of organization at different physical scales. Often, the
organization is hierarchical. Examples can be found in a variety of fields, such as biological,
social and technological systems. Among the first, and starting from complex molecules (such as
lipids, proteins, RNA or DNA) while increasing the scale of observation, new levels of
organization are found: organelles, cells, tissues, organs, anatomical systems, organisms,
populations and ecosystems. In the social context, human societies organize from the level of
individuals and groups to cities, up to the global scale of countries or continents. Finally, among
technological systems, computer networks are also arranged at different scales, from the local
network level up to the domain-level routing systems that constitute the backbone of the Internet.
Hierarchical organization seems ubiquitous in complex systems and, despite the early interest of
the scientific community in the subject IS-IS], it is far from being fully understood. The
description of the hierarchical organization of complex systems remains, to a great extent, at the
semantic level. This is mainly because of the following difficulties: the existence of several
relevant physical scales, the existence of a variety of organizing principles, the large number of
components, and the lack of a sufficiently general and well-defined formal theory for the
identification of hierarchies.

The study of complex networks [IHZI plays a central role in the characterization of the
organization of complex systems. In essence, networks are used to represent the structure of the
interactions between the components of the system under consideration. Therefore, it is reasonable
to assume that some complex networks have hierarchically organized topologies, reflecting the
underlying hierarchical organization of the associated complex systems. A natural way of thinking
about hierarchical network topologies is that of hierarchical community structures, i.e.
communities within communities of nodes i-unj. Typically, the identification of the communities of
a network is a computationally intensive and statistically difficult problem m-. Although a large
number of community detection methods have already been developed HMZj - including methods for the
identification of hierarchical community structures [510 nMg - not all methods provide comparable
results. This is especially true for hierarchical community structures. Therefore, similarity
measures for the comparison of hierarchical community structures are of crucial importance. The
aim of this paper is to introduce an information-theoretic tool which can be used to compare
hierarchies, or trees, which might be composed of network communities. We further show that this
tool can be employed to trace the evolution of hierarchies when temporal networks are analyzed.

A standard way to quantify the similarity of two community structures is to compute the mutual
information between the associated node partitions EillTl. Extending the idea, the present paper
introduces a hierarchical mutual information, generalizing the traditional mutual information to
work with hierarchical partitions. In principle, there might be different ways in which the mutual
information can be generalized into a hierarchical mutual information. In this work, hierarchies
are considered to be of divisive nature; i.e. the whole is divided into parts, each of which is
sub-divided into sub-parts, and so on, following a top-down approach. As a consequence, in this
context hierarchies are represented by trees with branches of varying length. Other generalization
approaches might exist, for example generalizations that consider agglomerative hierarchies (i.e.
bottom-up approaches) or overlapping communities. Alternatively, related methods exist for the
comparison of phylogenetic trees [281130]. Recently, a method to compare hierarchies was
introduced; the method follows a combinatorial approach |3T]. However, to the best of our
knowledge, no previous method based on information-theoretic measures exists for the comparison of
hierarchies. These alternative methods, and the previously mentioned generalization approaches,
are not discussed further in this paper, but can be considered in future work.

The outline of the paper is the following. In Section II the hierarchical mutual information is
motivated and introduced. In Section III this measure is tested on different setups. More
specifically, in sub-section III A the behavior of the hierarchical mutual information is tested
in artificial hierarchies, while in sub-section III B the hierarchical mutual information is used
to analyze the hierarchical community structure of artificial networks, or network models. A
similar procedure is performed on empirical networks in sub-section III C, including the case of a
temporal one. Finally, the discussion and conclusions are summarized in Section IV.


II. THEORY 


A. Hierarchical Partitions 

A hierarchical partition is a generalization of the traditional concept of partition. Here, each
element of the partition can be recursively partitioned into others, yielding a hierarchy. The
formal definition is as follows. Consider a set of elements, or universe, denoted by $\Omega$. An
element in $\Omega$ is denoted by $i$. The set $\Omega$ splits into a hierarchy of sub-sets,
denoted by $v$. The number of elements in the sub-set $v$ is written as $|v|$. The hierarchical
partition, or simply hierarchy, is represented by a tree denoted by $T$. The root $v_0 \in T$ is
the "oldest ancestor" of the various vertices, or descendants, in the tree $T$. As a sub-set, the
root contains the whole set of elements, i.e. $v_0 = \Omega$. For any sub-set $v \in T$, $A_v^T$
denotes the set of direct descendants of $v$. A sub-set $v$ is at the $l$-th level (or depth) of
the hierarchy if $l$ is the topological distance from $v$ to $v_0$. When there is no confusion, we
simplify the notation to $A_v$, i.e. by omitting the reference to $T$.

Consider a network of nodes $i$ and links (weighted or not) $w_{ii'}$. Here, the terms elements
and nodes are used interchangeably, both referring to the entities denoted by $i$. Traditionally,
the community structure of a network is represented by a node partition. In many cases, these
communities present a hierarchical organization. In particular, if the hierarchy is constituted by
sub-communities within communities, then the structure can be mapped to a hierarchical partition
$T$. Depending on the context, $T$ is referred to as a tree, as a hierarchical community
structure, or simply as a hierarchy; i.e. the terms are used interchangeably. Each sub-set
$v \in T$ corresponds to one and only one sub-community of the hierarchical community structure of
the network (see Fig. 1a). The root $v_0$ represents the set of all nodes in the network. The
children $u \in A_v$ correspond to a partition of the sub-community $v$ into sub-communities $u$.
The leaves of $T$ are the smallest sub-communities of the network. Finally, each sub-community
$v \in T$ has an associated sub-network with links $w^v_{ii'}$ between the pairs of nodes
$i, i' \in v$.
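
As an illustration of the definitions above, the following minimal sketch encodes a hierarchical partition as nested Python lists and recovers the sub-communities found at each level. The nested-list encoding and the helper names (`elements`, `is_leaf`, `by_level`) are assumptions made for this example only; they are not taken from the authors' package. The sketch also assumes that a sub-community contains either bare elements or sub-communities, never a mix of both.

```python
def elements(node):
    """Return the set of elements contained in a sub-community."""
    if not isinstance(node, list):
        return {node}
    found = set()
    for child in node:
        found |= elements(child)
    return found


def is_leaf(node):
    """A leaf holds bare elements only, i.e. it has no nested sub-communities."""
    return not any(isinstance(child, list) for child in node)


def by_level(tree):
    """Map each depth l to the sub-communities (as frozensets) found at that depth."""
    levels, frontier, depth = {}, [tree], 0
    while frontier:
        levels[depth] = [frozenset(elements(v)) for v in frontier]
        # Descend only through sub-communities that are further partitioned.
        frontier = [u for v in frontier if not is_leaf(v) for u in v]
        depth += 1
    return levels


# Hypothetical example: six elements split into {a, b, c} and {d, e, f},
# with {a, b, c} further split into {a} and {b, c}.
T = [[["a"], ["b", "c"]], ["d", "e", "f"]]
for depth, subsets in by_level(T).items():
    print(depth, [sorted(s) for s in subsets])
```

The root appears at depth 0 and contains every element, and each deeper level lists only the sub-communities produced by a further split, so branches of different length are handled naturally.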


B. Uncertainty Reduction 

In this section, the definition of the hierarchical mutual information is motivated. Only
Shannon-based information measures are used throughout the rest of the paper [5^ .

Consider how the uncertainty about the identification of a specific node $i$ is reduced when going
down a tree $T$. As the root $v_0 \in T$ represents the set of all nodes, looking for a specific
node $i$ requires checking $\approx \log_2 |v_0|$ binary choices. In other words, the uncertainty
is reduced by $\ln|v_0|$ nats when a node $i$ is unequivocally identified (a nat is a unit of
information equal to $1/\ln 2 \approx 1.44$ bits), and there is no uncertainty left. Sometimes the
information pointing towards a specific node is not precise, and the uncertainty reduction is not
complete. For example, if node $i$ is specified to be in the sub-community $v$, the uncertainty
reduction is $\ln|v_0| - \ln|v| = -\ln(|v|/|v_0|)$ nats, and $\ln|v|$ nats of uncertainty still
remain.

Traversing a hierarchy along descendants is similar to a sequential reduction of uncertainty. More
specifically, it is possible to write

$$
-\ln\frac{1}{|v_0|} = -\ln\frac{|v_1|}{|v_0|} - \ln\frac{|v_2|}{|v_1|} - \dots
- \ln\frac{|v_l|}{|v_{l-1}|} - \ln\frac{|v_{l+1}|}{|v_l|} - \dots
- \ln\frac{|v_{L_i}|}{|v_{L_i-1}|} - \ln\frac{1}{|v_{L_i}|}, \tag{1}
$$

where $L_i$ is the deepest level at which node $i$ can be found. Each term
$-\ln(|v_l|/|v_{l-1}|)$ can be considered a conditional uncertainty reduction. Specifically, it
measures how much the uncertainty is reduced when new information is gained (that $i \in v_l$),
given that some other information was already available (that $i \in v_{l-1}$).

It is possible to average over nodes $i$ using an appropriately weighted version of the expression
in Eq. (1). More specifically, the average uncertainty reduction along the tree $T$ is defined as

$$
\langle H_T \rangle := -\sum_{v_1 \in A_{v_0}} \frac{|v_1|}{|v_0|}\ln\frac{|v_1|}{|v_0|} - \dots
- \sum_{v_L} \frac{|v_L|}{|v_0|}\ln\frac{1}{|v_L|}, \tag{2}
$$

where, for simplicity, we wrote the equation for the particular case in which all branches of the
tree $T$ have the same length, i.e. $L_i = L$ for all $i \in \Omega$, and the last sum runs over
the sub-communities $v_L$ at the deepest level. The general case is introduced later in
sub-section II C. In Eq. (2) every reduction step is weighted by the fraction of nodes that are
found by following the corresponding branch of the tree $T$. Using similar ideas, the hierarchical
mutual information is defined in the next section.
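
The sequential reduction of uncertainty in Eq. (1) can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' code: the hierarchy is encoded as nested lists, and the helper names `size`, `contains` and `reduction_steps` are hypothetical. For every element, the conditional reductions collected along its branch should add up to $\ln|v_0|$.

```python
import math


def size(node):
    """Number of elements |v| in a sub-community (a bare element counts as 1)."""
    return sum(size(c) for c in node) if isinstance(node, list) else 1


def contains(node, i):
    """True if element i belongs to the sub-community."""
    if isinstance(node, list):
        return any(contains(c, i) for c in node)
    return node == i


def reduction_steps(tree, i):
    """Terms of Eq. (1) for element i, in nats, from the root down to i itself."""
    steps, node = [], tree
    while any(isinstance(c, list) for c in node):
        child = next(c for c in node if contains(c, i))
        steps.append(-math.log(size(child) / size(node)))  # -ln(|v_l| / |v_{l-1}|)
        node = child
    steps.append(-math.log(1 / size(node)))  # last step: pick i inside its leaf
    return steps


# Hypothetical six-element hierarchy used only for this check.
T = [[["a"], ["b", "c"]], ["d", "e", "f"]]
for i in "abcdef":
    steps = reduction_steps(T, i)
    print(i, [round(s, 3) for s in steps], round(sum(steps), 3))
```

Because every branch telescopes to the same total, the differences between hierarchies show up in how the total is distributed over the levels, which is what the weighted averages in Eq. (2) keep track of.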


C. The Hierarchical Mutual Information 

In community detection problems, it is customary to quantify the similarity between two inferred
community structures using the mutual information between the corresponding node partitions
[IllllllEe]. Here, the goal is to introduce the hierarchical mutual information to quantify the
similarity between two hierarchical partitions, or trees, associated with the corresponding
hierarchical community structures.

Consider two trees $T$ and $T'$ and two sub-communities $v \in T$ and $v' \in T'$, both at the
same topological distance, or level $l$, from the roots of their corresponding trees. It is not
necessary for the trees $T$ and $T'$, nor for the sub-communities $v$ and $v'$, to contain the
same elements. Let $T_v$ represent the sub-tree of root $v$ obtained from $T$. The analogous holds
for $T'_{v'}$. The hierarchical mutual information between the sub-trees $T_v$ and $T'_{v'}$ is
denoted by $I(T_v; T'_{v'})$. By definition, it is assumed that $I(T_v; T'_{v'}) = 0$ if either
$v$ or $v'$ is a leaf of the corresponding tree. Otherwise, $I(T_v; T'_{v'})$ is recursively
defined by the formula

$$
I(T_v; T'_{v'}) := I(A_v; A_{v'} \mid v \cap v')
+ \sum_{u \in A_v} \sum_{u' \in A_{v'}} \frac{|u \cap u'|}{|v \cap v'|}\, I(T_u; T'_{u'}). \tag{3}
$$
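
In Eq. (3) the first term of the r.h.s. is called the one-step mutual information, and is defined
as

$$
I(A_v; A_{v'} \mid v \cap v') := H(A_v \mid v \cap v') + H(A_{v'} \mid v \cap v')
- H(A_v \cap A_{v'} \mid v \cap v'), \tag{4}
$$

where $H(\cdot)$ represents the Shannon entropy. These terms are computed as

$$
H(A_v \mid v \cap v') :=
\begin{cases}
-\sum_{u \in A_v} \dfrac{|u \cap v'|}{|v \cap v'|} \ln \dfrac{|u \cap v'|}{|v \cap v'|} & \text{if } |v \cap v'| \neq 0, \\
0 & \text{otherwise,}
\end{cases} \tag{5}
$$

with the analogous expression for $H(A_{v'} \mid v \cap v')$, and

$$
H(A_v \cap A_{v'} \mid v \cap v') :=
\begin{cases}
-\sum_{u \in A_v} \sum_{u' \in A_{v'}} \dfrac{|u \cap u'|}{|v \cap v'|} \ln \dfrac{|u \cap u'|}{|v \cap v'|} & \text{if } |v \cap v'| \neq 0, \\
0 & \text{otherwise.}
\end{cases} \tag{6}
$$

In all cases, the convention $0 \ln 0 = 0$ is adopted. Finally, the hierarchical mutual
information of two full trees $T$ and $T'$ is denoted and defined by

$$
I(T; T') := I(T_{v_0}; T'_{v'_0}), \tag{7}
$$

where $v_0$ and $v'_0$ are the roots of $T$ and $T'$, respectively.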

Each term involved in $I(T;T')$ is non-negative, and thus the hierarchical mutual information is a
non-negative quantity. Also, $I(T;T') = I(T';T)$, i.e. it is a symmetric function of its
arguments. When the trees $T$ and $T'$ are just stars, i.e. a root plus one generation of
descendants, it is possible to think of them as standard partitions. In this case, the
hierarchical mutual information reduces to the standard mutual information.

Note that the hierarchical mutual information is not a measure of the similarity between the
corresponding final partitions of the nodes at the leaves of the trees (except when both trees are
stars). Rather, it is a summation of weighted local one-step contributions, measuring how similar
the partitions are at each corresponding point in both trees. For example, if two nodes $i$ and
$i'$ are separated at level $l$ in tree $T$ and at level $l' \neq l$ in tree $T'$, then the
separation of $i$ and $i'$ contributes zero to the value of the hierarchical mutual information.

For practical purposes, a normalized hierarchical mutual information is defined as

$$
i(T; T') := \frac{I(T; T')}{\sqrt{I(T; T)\, I(T'; T')}}. \tag{8}
$$

We note that there exists more than one way to normalize the mutual information. Here, we work
with one that takes inspiration from the Cauchy-Schwarz inequality, but future experiments may
prove other normalization methods to be more convenient, depending on the particular context in
which the hierarchical mutual information is used. The value of $i(T;T')$ lies in the interval
$[0,1]$ and attains the maximum 1 if and only if $T = T'$, as indicated by the results of the
extensive numerical exploration reported in the following sections. However, formal proofs of the
previous statements, and of the following ones, are still missing. More specifically, it remains
to be proved that: i) $I(T;T') \le I(T;T)$ for all $T$ and $T'$ and, ii) the equality holds if and
only if $T = T'$. These statements imply the previous ones, and constitute desirable properties
for a well-defined measure of mutual information to have.

To help better understand the hierarchical mutual information, a simple example is worked out
explicitly. Consider the set of nodes $\{a,b,c,d,e,f\}$, and the two hierarchical partitions
$T = \{\{\{a\},\{b,c\}\},\{d,e,f\}\}$ and $T' = \{\{a\},\{b,c\},\{d,e,f\}\}$ (see Figs. 1b and
1c). Here, $v_0 = v'_0 = \{a,b,c,d,e,f\}$. Also, $A_{v_0} = \{\{a,b,c\},\{d,e,f\}\}$ and
$A_{v'_0} = \{\{a\},\{b,c\},\{d,e,f\}\}$. In the tree $T$, there is an intermediate sub-community
$\{a,b,c\}$ which is not in the other tree $T'$. As a consequence, the one-step mutual information
at level $l = 1$ is $I(A_{v_0}; A_{v'_0} \mid \{a,b,c,d,e,f\}) = 0.693$ (see Eq. (4)). All other
terms, corresponding to levels $l > 1$, contribute zero because they involve leaves. This is
because the tree $T'$ is just a star, which has only one level. Adding everything together,
$I(T;T') = 0.693$. On the other hand, the self-hierarchical mutual informations are
$I(T;T) = 1.011$ and $I(T';T') = 1.011$. Therefore, the normalized hierarchical mutual information
yields $i(T;T') = 0.685$, a value smaller than one. In other words, these trees share only a
fraction of the information they contain. This holds even though the partitions at the bottom are
the same for both trees.
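
The recursion of Eqs. (3)-(8) can be sketched in a few lines. What follows is a minimal illustration under stated assumptions and not the authors' package: hierarchies are encoded as nested lists, a leaf is a list of bare elements, and the names `flat`, `children`, `entropy`, `hmi` and `nhmi` are hypothetical. Run on the two trees of the worked example, it should reproduce the values quoted in the text.

```python
import math


def flat(node):
    """Set of elements under a sub-community."""
    if not isinstance(node, list):
        return frozenset([node])
    return frozenset().union(*(flat(c) for c in node))


def children(node):
    """Direct descendants A_v; empty when the sub-community is a leaf."""
    return [c for c in node if isinstance(c, list)]


def entropy(counts, total):
    """Shannon entropy in nats, with the convention 0 ln 0 = 0."""
    return -sum(c / total * math.log(c / total) for c in counts if c > 0)


def hmi(t, s):
    """Hierarchical mutual information I(T_v; T'_v') of two (sub-)trees, in nats."""
    kids_t, kids_s = children(t), children(s)
    common = flat(t) & flat(s)
    n = len(common)
    if not kids_t or not kids_s or n == 0:
        return 0.0  # a leaf is involved, or the sub-communities share no element
    sets_t = [flat(u) for u in kids_t]
    sets_s = [flat(w) for w in kids_s]
    joint = [[len(a & b) for b in sets_s] for a in sets_t]
    one_step = (entropy([len(a & common) for a in sets_t], n)
                + entropy([len(b & common) for b in sets_s], n)
                - entropy([c for row in joint for c in row], n))
    # Weighted recursion over every pair of children that shares elements.
    deeper = sum(joint[a][b] / n * hmi(kids_t[a], kids_s[b])
                 for a in range(len(kids_t))
                 for b in range(len(kids_s)) if joint[a][b] > 0)
    return one_step + deeper


def nhmi(t, s):
    """Normalized hierarchical mutual information of Eq. (8)."""
    return hmi(t, s) / math.sqrt(hmi(t, t) * hmi(s, s))


T = [[["a"], ["b", "c"]], ["d", "e", "f"]]
T_prime = [["a"], ["b", "c"], ["d", "e", "f"]]
print(round(hmi(T, T_prime), 3), round(hmi(T, T), 3),
      round(hmi(T_prime, T_prime), 3), round(nhmi(T, T_prime), 3))
# prints: 0.693 1.011 1.011 0.685
```

The sketch recomputes element sets at every call, which keeps it short but is wasteful for large trees; caching the sets per sub-community would be the natural refinement.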

To facilitate future research, collaboration and scientific reproducibility, we provide Python
[321 code implementing the hierarchical partition data-structure and the hierarchical mutual
information function, as an open-source package [M]. The example of Figs. 1b and 1c is provided in
the Python package.



FIG. 1. (Color online). a) Illustration of how a hierarchy of communities obtained from a
Sierpinski network $w_{ii'}$ corresponds to a hierarchical partition, or tree $T$. The root
$v_0 \in T$ contains all the nodes of the network $w_{ii'}$, and $v$ represents a sub-community at
level $l = 1$. In b) and c), two simple hierarchical partitions, or trees, of the same set of
nodes, $\{a,b,c,d,e,f\}$, are presented. On the tree $T$, the node $a$ is separated from the other
nodes $\{b,c\}$ at the level $l = 2$, while on the tree $T'$ the separation occurs at level
$l = 1$. This difference implies a normalized hierarchical mutual information smaller than one,
even if the partition at the bottom of both trees is the same.


III. RESULTS 

A. Testing the Hierarchical Mutual Information in 
Artificial Hierarchies 

Before focusing on the hierarchical community structures of networks, we analyze the behavior of
the hierarchical mutual information when used to compare artificially generated hierarchical
partitions. More specifically, we consider hierarchies composed of binary trees $T$ containing
$N = 2^L$ elements $i$, $L$ levels, and $2^{L+1} - 1$ sub-communities including the root. Each
tree has one element $i$ per sub-community at the bottom level $l = L$, two elements per
sub-community at the previous level $l = L - 1$, and so on until it has $N$ elements at the root.

In the experiments, the original trees are compared against correspondingly randomized ones. The
idea is to show how the normalized hierarchical mutual information decays with respect to the
level of randomization. Two different randomization procedures are used.

In the first randomization procedure, pairs of elements are randomly chosen from the tree, and
consecutively swapped until a fraction $f$ of them is affected. This is called the basic
randomization procedure. In Fig. 2 the average normalized hierarchical mutual information
$\langle i_L \rangle$ is plotted vs the fraction $f$ of randomized elements. The average is
computed over 100 repetitions of the randomization procedure, for each value of $f$ and $L$.
Notice that $\langle i_L \rangle$ decays approximately exponentially with respect to $f$; further,
it is almost independent of $L$ except for large values of $f$, where finite-size effects become
important. In particular, when the hierarchy is fully randomized, i.e. $f = 1$,
$\langle i_L \rangle$ is non-zero. Although a priori this may be attributed to an error, it is
indeed an expected result for finite-size hierarchies: random coincidences produce a non-zero
amount of shared information. A similar result is known to hold for the traditional mutual
information m-.

FIG. 2. (Color online). The normalized hierarchical mutual information, $\langle i_L \rangle$,
comparing hierarchical partitions represented by binary trees with $L$ levels, and corresponding
randomized partitions with a fraction $f$ of the elements shuffled at random. The average is
computed over 100 realizations of the shuffling procedure, and the different curves correspond to
trees with different numbers of levels $L$. The black dashed line corresponds to an exponential
fit, $\langle i_L \rangle = \exp(-f/f_0)$ with $f_0 = 0.490 \pm 0.004$ and $R^2 = 0.968$, for the
case $L = 10$. Error bars and standard-deviation bars are not plotted for clarity.

In the second procedure, the elements are also shuffled by swapping pairs chosen at random.
However, a given pair is swapped only if both elements belong to the same sub-community at depth
$l$. In other words, the randomization procedure preserves the classification of the elements at
the levels $0, 1, \dots, l-1$, while in the subsequent levels $l, l+1, \dots, L$ the original
classification is destroyed. Again, the swapping procedure runs until a fraction $f$ of the
elements is affected. This second procedure is called the level-preserving randomization
procedure. In Fig. 3 the average normalized hierarchical mutual information $\langle i_l \rangle$
is plotted as a function of $f$ for the level-preserving randomization procedure. Here,
experiments are repeated for different values of $l$ and fixed $L = 7$. Averages are computed as
with the basic randomization procedure. In line with the previous result of Fig. 2,
$\langle i_l \rangle$ also decreases with $f$, approximately following an exponential decay. Now,
the greater the shuffling level $l$, the slower the decay. In particular, for $l = 6$ no decay at
all is observed, i.e. $\langle i_l \rangle = 1$ for all $f$. This is expected because the trees
have $L = 7$ levels and only one element per sub-community at the bottom level, which does not
contribute to the hierarchical mutual information.
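
The two randomization procedures can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the code used for the figures: trees are nested lists, all names (`binary_tree`, `shuffled`, `flat`, `hmi`, `nhmi`) are hypothetical, and the partner of a swap is drawn directly from the same depth-`depth` sub-community instead of rejecting pairs that fall in different ones. With `depth=0` any pair can be swapped, which corresponds to the basic procedure.

```python
import math
import random


def binary_tree(labels):
    """Complete binary hierarchy whose bottom sub-communities hold one element each."""
    if len(labels) == 1:
        return [labels[0]]
    half = len(labels) // 2
    return [binary_tree(labels[:half]), binary_tree(labels[half:])]


def shuffled(labels, f, depth=0, rng=random):
    """Swap random pairs inside the same depth-`depth` block until a fraction f is affected."""
    labels = list(labels)
    n = len(labels)
    block = n >> depth  # block size at that depth; n is assumed to be a power of two
    assert block >= 2, "blocks of one element cannot be shuffled"
    touched = set()
    while len(touched) < f * n:
        i = rng.randrange(n)
        start = (i // block) * block
        j = rng.randrange(start, start + block)
        if i != j:
            labels[i], labels[j] = labels[j], labels[i]
            touched.update((i, j))
    return labels


def flat(node):
    if not isinstance(node, list):
        return frozenset([node])
    return frozenset().union(*(flat(c) for c in node))


def hmi(t, s):
    """Compact hierarchical mutual information, in nats."""
    kids_t = [c for c in t if isinstance(c, list)]
    kids_s = [c for c in s if isinstance(c, list)]
    common = flat(t) & flat(s)
    n = len(common)
    if not kids_t or not kids_s or n == 0:
        return 0.0
    sets_t, sets_s = [flat(u) for u in kids_t], [flat(w) for w in kids_s]
    joint = [[len(a & b) for b in sets_s] for a in sets_t]

    def h(counts):
        return -sum(c / n * math.log(c / n) for c in counts if c)

    one_step = (h(len(a & common) for a in sets_t) + h(len(b & common) for b in sets_s)
                - h(c for row in joint for c in row))
    return one_step + sum(joint[a][b] / n * hmi(kids_t[a], kids_s[b])
                          for a in range(len(kids_t))
                          for b in range(len(kids_s)) if joint[a][b])


def nhmi(t, s):
    return hmi(t, s) / math.sqrt(hmi(t, t) * hmi(s, s))


L = 7
original = list(range(2 ** L))
rng = random.Random(0)
for depth in (0, 3, 6):
    randomized = shuffled(original, 0.5, depth, rng)
    print(depth, round(nhmi(binary_tree(original), binary_tree(randomized)), 3))
```

Averaging the printed value over many realizations for a grid of `f` values would give curves of the kind described in the text; a single realization, as here, only indicates the trend.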


B. Comparing Community Detection Algorithms 
on Artificial Networks 

1. Community Detection Methods 

One of the interesting applications of the normalized hierarchical mutual information is comparing
the results yielded by different community detection methods. In this paper, three community
detection methods are compared: Infomap [5], which finds a hierarchy of communities through the
minimization of the description length of the path traversed by a random walker; the Hierarchical
Stochastic Block Model method (HSBM) [5], which fits a hierarchy of stochastic block models to the
network topology; and a Recursive Louvain method (RL), which recursively splits the network into a
hierarchy of network modules using, at each step, the well-known Louvain community detection
algorithm [35]. In what follows, the relevant aspects of the different methods are considered in
more detail.

FIG. 3. (Color online). The normalized hierarchical mutual information $\langle i_l \rangle$
comparing hierarchical partitions represented by binary trees with $L = 7$ levels, and
corresponding randomized partitions where a fraction $f$ of the elements are randomly shuffled.
The randomization procedure preserves the element classification of levels $0, 1, \dots, l-1$,
but affects the rest of the levels (different curves with symbols). The dashed line is the best
fit of an exponential decay $\langle i_l \rangle = \exp(-f/f_0)$, with $f_0 = 0.478 \pm 0.008$ and
$R^2 = 0.978$, for $l = 0$. For clarity, error bars and standard-deviation bars are not shown.

Infomap returns hierarchies that are consistent with a divisive algorithm, i.e. the branches of
the corresponding trees may have different depths. The algorithm itself uses both approaches,
repeatedly. Communities are split and merged until a minimum description length is attained. In
the hierarchies obtained by this method, the leaves have one and only one node. For the sake of
comparison with the other methods, these communities of size equal to one are ignored, unless
communities of size larger than one exist at the same level.

Unlike Infomap, the HSBM merges nodes to generate super-nodes or communities, which are further
merged to obtain the communities at the contiguous higher level, and so on. As a consequence, all
the branches of the returned trees have the same depth. Moreover, the HSBM may return trees
containing sub-communities with descendants but no further sub-divisions, i.e. sub-communities
with only one child. Although the hierarchies produced by the HSBM can be compared using the
hierarchical mutual information, as they are hierarchical partitions, the comparison is not fully
appropriate. This is because the hierarchical mutual information is based on a divisive approach,
while the HSBM is based on an agglomerative approach. The experiments involving the HSBM show how
important the difference between the two kinds of approaches is.

The recursive application of the Louvain method is a mixed agglomerative-divisive algorithm. The
standard Louvain method is an agglomerative algorithm; a community structure is obtained by
merging modules until the modularity [36] of the partition, denoted by $Q$, reaches a maximum
value [35]. On the other hand, the recursive use of Louvain presented here is a divisive method.
More specifically, given a network $w_{ii'}$ (defining the level $l = 0$), a standard Louvain
method is applied to obtain a partition into sub-communities $v$ at level $l = 1$. Then, Louvain
is applied again on each sub-community, to split each sub-network $w^v_{ii'}$ into sub-communities
at level $l = 2$, and so on. In this way, a tree $T$ is generated. The division of a particular
sub-community stops when the standard Louvain returns a modularity $Q < 0$. Importantly, the
Louvain method is not deterministic, leading to stochastic differences from run to run. Two
important points have to be stressed. First, the use of Louvain is circumstantial; any other
modularity maximization procedure would produce similar results. Second, the idea of a recursive
application of a modularity-based community detection algorithm is not new, and more elaborate
algorithms do exist nni Ezi [3Zj. However, here RL is chosen for its simplicity. Our main goals
are to show how the hierarchical mutual information behaves, and to illustrate how it can be used,
without aiming to find the best community detection method.
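
The recursive scheme described above does not depend on the details of the flat method that is applied at each step. The sketch below illustrates only the recursion, under loud assumptions: `detect` is a hypothetical callable standing in for one run of a flat community detection method (such as Louvain) that returns a partition of the given nodes together with a quality score, and the stopping rule used here (non-positive quality, or no actual split) is a choice made for the example. A toy detector replaces a real modularity-based method so that the block is self-contained.

```python
def recursive_split(nodes, detect):
    """Build a hierarchy (nested lists) by applying a flat method recursively.

    `detect(nodes)` is a hypothetical callable returning `(parts, quality)`,
    where `parts` is a list of node lists and `quality` plays the role of the
    modularity Q of that partition.
    """
    parts, quality = detect(nodes)
    if quality <= 0 or len(parts) < 2:
        return sorted(nodes)  # no meaningful split: this sub-community is a leaf
    return [recursive_split(part, detect) for part in parts]


def toy_detect(nodes):
    """Toy stand-in for a flat method: halve the sorted nodes while more than two remain."""
    nodes = sorted(nodes)
    if len(nodes) <= 2:
        return [nodes], 0.0
    half = len(nodes) // 2
    return [nodes[:half], nodes[half:]], 1.0


tree = recursive_split(range(8), toy_detect)
print(tree)
# prints: [[[0, 1], [2, 3]], [[4, 5], [6, 7]]]
```

With a stochastic detector in place of `toy_detect`, repeated calls would return different trees for the same network, which is the run-to-run variability mentioned in the text.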


2. Artificial Hierarchical Networks 

In order to analyze the performance of the different community detection methods, in this section
they are run on specific networks. Here, two well-known benchmark network models are used to
generate the networks necessary for the experiments. In principle, these network models are able
to generate network samples with underlying hierarchical community structures. Clearly, the
specific characteristics of the generated networks depend on the parameter values chosen.

The first network model is the hierarchical planted partition model (HPM) |2T|, a generalization
of the planted partition model [38] where the network obtained is hierarchically arranged. In this
model, $N$ nodes are connected according to a hierarchical structure of $L$ levels and a branching
factor $B$. For practical purposes, we chose $N = 512$ nodes, $L = 3$ levels and a branching
factor $B = 4$ (see Fig. 4). At the root level, $l = 0$, all nodes belong to the same community.
At level $l = 1$, there exist $B = 4$ communities with 128 nodes each. Next, at the final level
$l = 2$, there are $B^2 = 16$ communities, with 32 nodes each. Each node has an average of $K_l$
links to nodes exclusively within the community it belongs to at level $l$, i.e. $K_2$, $K_1$,
$K_0$ links to other nodes in the same communities at levels 2, 1, 0, respectively. Therefore, the
total average degree of the nodes is $\langle k \rangle = K_0 + K_1 + K_2$. In principle, networks
sampled from the HPM have the expected hierarchical community structure whenever
$K_0 < K_1 < K_2$ m-.
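
A sample from a model of this kind can be drawn with a short sketch. The implementation below is one possible reading of the description above and not the reference generator: each pair of nodes is linked independently with a probability chosen so that a node has, on average, $K_l$ links to nodes that share its level-$l$ community but not a deeper one. The function name `hpm_sample`, its parameters and the contiguous labelling of communities are assumptions made for the example.

```python
import random


def hpm_sample(K=(0.25, 2.0, 8.0), sizes=(512, 128, 32), seed=None):
    """Edge list of one sample; sizes = community sizes at levels 0, 1 and 2."""
    rng = random.Random(seed)
    n, mid, small = sizes
    # Number of possible partners of a node that share its community at level l
    # but not its community at level l + 1.
    partners = (n - mid, mid - small, small - 1)
    prob = [k / m for k, m in zip(K, partners)]
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            if i // small == j // small:
                level = 2  # same small community
            elif i // mid == j // mid:
                level = 1  # same intermediate community only
            else:
                level = 0  # only the root is shared
            if rng.random() < prob[level]:
                edges.append((i, j))
    return edges


edges = hpm_sample(seed=1)
mean_degree = 2 * len(edges) / 512
print(len(edges), round(mean_degree, 2))
```

With the default values the expected average degree is $K_0 + K_1 + K_2 = 10.25$; a single sample fluctuates around that value.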

FIG. 4. (Color online). A sample network obtained from the hierarchical planted partition model
(HPM), for $K_0 = 0.25$, $K_1 = 2$ and $K_2 = 8$. The network contains $N = 512$ nodes, 4 big
communities of 128 nodes each, and 16 small communities of 32 nodes. The darkest links connect
pairs of nodes sharing the same small community at level $l = 2$; the links of intermediate
brightness connect nodes sharing intermediate communities at level $l = 1$, but not sharing the
same small communities. Finally, the brightest links connect pairs of nodes at level $l = 0$, but
not nodes sharing communities at levels $l = 1$ and $l = 2$.

The second network model consists of Sierpinski networks with $L$ levels. Fig. 1a illustrates a
Sierpinski network with $L = 3$. These networks have a natural self-similar and hierarchical
modular structure. A Sierpinski network with a single level is just a clique with 3 nodes, i.e. a
triangle. A network of this type with $L + 1$ levels is obtained by replacing each node of a
Sierpinski network with $L$ levels by a clique of size 3. It is worth pointing out that a
Sierpinski network with $L$ levels has $N(L) = 3^L$ nodes, $M(L) = 3[M(L-1) + 1]$ links, and its
average degree $\langle k \rangle \to 3$ when $L \to \infty$. To make the analysis more
interesting, a fraction $f$ of the links in the Sierpinski networks are randomly rewired. The
rewiring procedure is well-known [39]. Essentially, successive pairs of links, each of which is
chosen at random, swap the nodes at their extremes until a fraction $f$ of the links is affected.
In this way, there is a well-defined hierarchy of communities for $f = 0$, which is progressively
blurred out as $f$ increases.
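
The node-replacement construction and the link-swapping step can be sketched as below. This is an illustration under assumptions rather than the procedure of the cited reference: the function names (`sierpinski`, `rewire`, `degrees`) are hypothetical, the corner of a triangle that receives each old link is chosen in order of appearance, and swaps that would create self-loops or duplicate links are skipped.

```python
import random


def sierpinski(levels):
    """Nodes and edge list of a Sierpinski network built by node replacement."""
    nodes, edges = [0, 1, 2], [(0, 1), (0, 2), (1, 2)]
    for _ in range(levels - 1):
        corner = {u: 0 for u in nodes}  # next free corner of the triangle replacing u
        new_edges = []
        for u in nodes:  # node u becomes the triangle 3u, 3u + 1, 3u + 2
            new_edges += [(3 * u, 3 * u + 1), (3 * u, 3 * u + 2), (3 * u + 1, 3 * u + 2)]
        for u, w in edges:  # each old link joins one free corner of each triangle
            new_edges.append((3 * u + corner[u], 3 * w + corner[w]))
            corner[u] += 1
            corner[w] += 1
        nodes = [3 * u + c for u in nodes for c in range(3)]
        edges = new_edges
    return nodes, edges


def rewire(edges, f, seed=None):
    """Swap the end nodes of random pairs of links until a fraction f of links is affected."""
    rng = random.Random(seed)
    edges = list(edges)
    present = {frozenset(e) for e in edges}
    touched = set()
    while len(touched) < f * len(edges):
        a, b = rng.sample(range(len(edges)), 2)
        (u, v), (x, y) = edges[a], edges[b]
        new_a, new_b = frozenset((u, y)), frozenset((x, v))
        if len(new_a) < 2 or len(new_b) < 2 or new_a in present or new_b in present:
            continue  # the swap would create a self-loop or a duplicate link
        present -= {frozenset((u, v)), frozenset((x, y))}
        present |= {new_a, new_b}
        edges[a], edges[b] = (u, y), (x, v)
        touched.update((a, b))
    return edges


def degrees(edges):
    """Degree of every node appearing in the edge list."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


nodes, edges = sierpinski(4)
rewired = rewire(edges, 0.3, seed=0)
print(len(nodes), len(edges), degrees(edges) == degrees(rewired))
# prints: 81 120 True
```

Because each swap exchanges end nodes between two links, every node keeps its degree, so only the hierarchical arrangement of the links is blurred as `f` grows.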

In the following sections, the different community detection methods and these two network models
are combined into a set of experiments analyzed using the normalized hierarchical mutual
information.


3. Hierarchical-fidelity 

Each network model has an associated natural or reference hierarchy, denoted as $T^*$.
Specifically, the reference hierarchies for the Sierpinski and HPM models are shown in Figs. 1a
and 4, respectively. The hierarchies identified by the community detection methods are not
necessarily equal to the reference ones, and in some cases they do not even resemble them. The
degree of fidelity of the community detection method measures how similar the identified
communities are to the reference ones. Formally, given a community detection method, a network
sample $w_{ii'}$ and a reference hierarchy $T^*$, the hierarchical fidelity (or simply, fidelity)
of the community detection method is defined as the average normalized hierarchical mutual
information, $\langle i(T^*;T) \rangle$. The average is computed by summing over an ensemble of
hierarchies $\{T\}$, obtained by repeatedly identifying the hierarchical community structure of
the network $w_{ii'}$ using the chosen community detection method. In the results shown, the
ensemble $\{T\}$ was composed of 100 hierarchies. Furthermore, the fidelity is averaged by
sampling 100 networks from each network model. The procedure is repeated for different values of
the network model parameters, and using the different community detection methods.

For the case of the HPM, two different model re-parameterizations are used. In one case the whole
network structure changes simultaneously, while in the other case only one level is affected m-.
More specifically, in case 1 the parameters $K_0 = 7.75\mu + 0.25$ and $K_1 = 6\mu + 2$ are
linearly re-parameterized by $\mu \in [0,1]$, while $K_2$ is kept constant. In case 2, the
parameters $K_0 = K_2 = 8$ are kept constant, while $K_1 = 8\mu + 4$ changes linearly with $\mu$.
For the case of the Sierpinski network model, the parameter is the fraction of randomized links,
$f$, as mentioned in Section III B 2. In what follows, the results are presented and discussed.

First, the results of the fidelity for Infomap are shown in Fig. [^. In the HPM, case 1, Infomap
detects the reference hierarchy almost always for $\mu = 0$, and the fidelity is $\approx 1$. At
the other extreme, at $\mu = 1$, Infomap typically finds a one-level hierarchy composed of 4
communities with 128 nodes. The 4 communities are the right ones at the level $l = 1$, and the
fidelity decays to $\approx 1/\sqrt{2} \approx 0.707$. The decay in the fidelity is expected
because all $K_l$ converge to the same value $K_l = 8$ when $\mu \to 1$, making the generated
network hierarchies less defined. In case 2, the same scenario occurs for $\mu > 1/2$, i.e. the
same 4 communities are identified. On the other hand, for small $\mu$, the structure of the
network is dominated by links at levels 0 and 2. As a consequence, and depending on the particular
network realization, Infomap finds a one-level hierarchy with either 1 or $\approx 16$
communities, resulting in a small fidelity value. For the Sierpinski networks, the behavior can be
more easily interpreted. For small $f$, Infomap finds an approximately accurate representation of
the exact hierarchy of communities. However, as $f$ grows, the hierarchy is quickly blurred out
and the fidelity decays accordingly.

The results of the fidelity for the HSBM method are shown in Fig. 5b. For the
HPM, the fidelity is almost a constant function of $\mu$, for both cases 1 and
2. A closer inspection reveals that, typically, the HSBM method splits the
network samples into two communities at level $l = 1$, which are then further
subdivided, giving rise to a hierarchy with 3 levels. Interestingly, the
identified hierarchies are similar regardless of the value of $K_i$.
Therefore, the resulting fidelity is relatively small because the identified
hierarchies are significantly different from the reference one. In essence,
the two communities identified at level $l = 1$ differ significantly from the
expected value of 4. For the case of the Sierpinski model, the HSBM typically
detects only one community, except for vanishing $f$ and $L = 5$, where the
network splits into two big ones. For this second model, the fidelity is also
generally small. In our view, this occurs because of two characteristics of
the HSBM. On the one hand, the HSBM follows a conservative approach; no
divisions are introduced until there is enough statistical evidence to justify
them in terms of a hierarchy of stochastic block models. On the other hand,
the HSBM follows a bottom-up approach - the elements in O are iteratively
merged into modules, super-modules and so on, generating a tree $\mathcal{T}$
with all branches of the same topological length - while the hierarchical
mutual information is more appropriate
to compare top-down hierarchies (see|T]). In this sense, 
the comparison of the other methods with the HSBM by means of the
hierarchical mutual information highlights the crucial difference between
top-down and bottom-up approaches.

Thirdly, the fidelity is computed for the recursive Louvain (RL) method and
the corresponding results are shown in Fig. 5c. For the HPM, the fidelity is
not a monotonic function of $\mu$; instead, it displays a maximum at an
intermediate value of $\mu$. In general, this method tends to find the right
communities at the first level $l = 1$. However, the random fluctuations of
the network samples become meaningful information for RL, and therefore it
tends to split the networks into more communities than those found in the
reference hierarchy. As a consequence, the normalized hierarchical mutual
information yields values smaller than one. However, because the information
shared at level $l = 1$ is non-trivial and fairly accurate, the normalized
hierarchical mutual information is far from being negligible. On the other
hand, for the case of the Sierpinski network model, RL performs poorly. In
essence, this method finds significantly more communities than expected, even
at level $l = 1$, resulting in small fidelity values for all $f$.


4. Hierarchical-consistency

FIG. 5. (Color Online). The fidelity compares the hierarchical community
structure of the networks generated with the models against the corresponding
reference hierarchies. The topology of the generated networks changes as a
function of the parameter $\mu \in [0,1]$. For the HPM, case 1, $\mu$
parameterizes the network model according to $K_0(\mu) = 7.75\mu + 0.25$,
$K_1 = 6\mu + 2$ and $K_2 = 8$. Similarly, in case 2, $K_0 = K_2 = 8$ and
$K_1 = 8\mu + 4$. For the Sierpinski model, $f \in [0,1]$ is the fraction of
rewired links and $L$ is the number of network levels. Each panel corresponds
to one of the community detection methods discussed in the text: a) Infomap,
b) HSBM, and c) RL. In all cases, the bars represent standard deviations
around the mean.

In the previous section, it was shown that each community detection method
returns hierarchies different from the expected ones; therefore, some
questions arise. How mutually consistent are the returned hierarchies? Do
these hierarchies represent noise, or do they reflect a specific bias of the
detection method? The following set of experiments addresses these questions.
More specifically, the idea is to analyze how mutually similar, or consistent,
the hierarchies detected by each method are. Formally, the
hierarchical-consistency (or just consistency) of a method is defined as the
average normalized hierarchical mutual information
$\langle i(\mathcal{T};\mathcal{T}') \rangle$, where the average is computed
over an ensemble of pairs of hierarchies, $\{(\mathcal{T}, \mathcal{T}')\}$.
The hierarchies in the pairs are randomly chosen, without repetition, from the
ensembles of hierarchies generated in the previous experiments about the
fidelity. The procedure is repeated for each network sample in order to
average the consistency. The whole procedure is repeated for the different
network models and corresponding parameters.
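
The pair sampling behind the consistency can be illustrated with a short Python sketch for a single network sample. It rests on two assumptions that the text does not spell out: "without repetition" is read as "no pair is drawn twice and no hierarchy is paired with itself", and `nhmi` is a hypothetical callable implementing the normalized hierarchical mutual information.

```python
import random
from itertools import combinations
from statistics import mean
from typing import Any, Callable, Sequence


def consistency(
    ensemble: Sequence[Any],
    nhmi: Callable[[Any, Any], float],
    n_pairs: int,
    seed: int = 0,
) -> float:
    """Average nhmi over distinct pairs drawn from one ensemble."""
    # All unordered pairs of different hierarchies in the ensemble.
    all_pairs = list(combinations(range(len(ensemble)), 2))
    rng = random.Random(seed)
    # Draw pairs without replacement, capped at the number available.
    chosen = rng.sample(all_pairs, min(n_pairs, len(all_pairs)))
    return mean(nhmi(ensemble[i], ensemble[j]) for i, j in chosen)
```

Averaging the returned value over network samples gives the quantity plotted for each model and parameter value.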


In Fig. 6a, the consistency is analyzed when the hierarchical communities are
detected with Infomap. For the HPM, case 1 (see Section III B 2), the
consistency is $\approx 1$ for all values of $\mu$. In other words, in this
initial setting, Infomap always provides very consistent results. For case 2,
the consistency is also close to 1 when $\mu$ is large; however, it becomes
small for small $\mu$. This is expected because, as already mentioned,
Infomap's detection is largely bimodal: it finds either one or $\approx 16$
communities depending on the network sample, and these two cases are very
inconsistent with each other. For the Sierpinski networks, the consistency is
large when $f \approx 0$ and decays to a non-zero value for larger values of
$f$. In other words, network randomization becomes important for large $f$,
but part of the information captured by Infomap is still shared among the
detected hierarchies even in this case.


The results of the consistency for the HSBM are shown in Fig. 6b. For the HPM,
the observed consistency is large in both cases, 1 and 2, despite the small
fidelity with respect to the natural hierarchies shown in Fig. 5b. This means
that the HSBM returns hierarchies similar to each other, but significantly
different from the reference one. More specifically, the returned hierarchies
share similarities at level $l = 1$, but at the following levels the
differences become important - except for case 1 at $\mu = 0$, where the
consistency remains $\approx 1$. For the Sierpinski network model, the
consistency is negligible in most of the range of $f$. This is expected
because a flat hierarchy conveys no information, and the HSBM typically
returns trivial hierarchies for the Sierpinski networks, i.e. hierarchies with
only one community, the root one. Only for small values of $f$, for the case
$L = 5$, the consistency is non-zero, but still small. Here, only two
communities are identified, agreeing only over a small fraction of the nodes.

The consistency for the RL method is shown in Fig. 6c. For the HPM, the curves
look similar to the ones corresponding to the fidelities in Fig. 5c. In
essence, the computed hierarchies are very similar to each other, and to the
reference hierarchy. For the Sierpinski network model, the consistency can be
large, even if the fidelity is small. This means that the detected structure
is invariably the same, although different from the reference one.


5. Hierarchical-similarity 

It is clear that the different community detection methods return different
results. However, it remains to analyze how similar the results of one
detection method are to those of another. To address this point, the
hierarchical-similarity between two community detection methods is defined as
the average normalized hierarchical mutual information,
$\langle i(\mathcal{T}_1;\mathcal{T}_2) \rangle$. In shorthand, we speak of
the similarity. The average is computed over pairs of trees, where the trees
$\mathcal{T}_1$ are computed with one of the methods and the trees
$\mathcal{T}_2$ with the other method. Both sets of trees are computed from
the same network sample. Then, the similarity is averaged by sampling networks
from the different network models. The procedure is repeated for each set of
chosen values of the corresponding model parameters. In practice, the network
samples and corresponding sampled trees used to compute the fidelities are the
ones used to compute the similarities (see Figs. 5 and 7).
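
For one network sample, the similarities between all pairs of methods can be tabulated as in the Python sketch below. It assumes that the average runs over every cross pair of trees (the text only says "pairs of trees"), and it takes `nhmi` as a hypothetical implementation of the normalized hierarchical mutual information; the dictionary layout is likewise an illustrative choice.

```python
from itertools import combinations, product
from statistics import mean
from typing import Any, Callable, Dict, Sequence, Tuple


def pairwise_similarities(
    trees_by_method: Dict[str, Sequence[Any]],
    nhmi: Callable[[Any, Any], float],
) -> Dict[Tuple[str, str], float]:
    """Map each pair of method names to their average cross-method nhmi."""
    result = {}
    for a, b in combinations(sorted(trees_by_method), 2):
        # Every tree of method a is compared with every tree of method b.
        pairs = product(trees_by_method[a], trees_by_method[b])
        result[(a, b)] = mean(nhmi(x, y) for x, y in pairs)
    return result
```

With three methods this yields the three comparisons discussed next; averaging each entry over network samples gives one point of the corresponding curve.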

Combining the methods of Infomap, HSBM and RL, three different comparisons
are possible: Infomap vs. HSBM, Infomap vs. RL, and HSBM vs. RL. These are
presented in Figs. 7a, 7b and 7c, respectively. The HSBM method shares a small
similarity with the other two. This is expected, because the other methods
lead to relatively large fidelities, while the HSBM does not.

FIG. 6. The consistency measures how similar the different hierarchies
obtained from a given community detection method are to each other in a
specific network sample. The computations are repeated for several network
samples, for each network model, and for different values of the parameters
$\mu$ and $f$. See Fig. 5 for specific details of the simulation parameters.
The panels (a), (b) and (c) correspond, respectively, to the different
community detection methods: Infomap, HSBM and RL. In all cases, the bars
represent standard deviations around the mean.

The similarity between Infomap and the RL method is the largest among the
three possibilities. However, the similarity cannot be as large as the
consistency. This is not surprising, as Infomap is able to return
consistencies as large as 1, while RL is not. The largest similarity value is
$\approx 0.6$, occurring at $\mu = 0$ for case 1 in the HPM. Also, the
similarity is $\approx 0.5$ at $\mu = 1$ for both cases, 1 and 2. For the
Sierpinski network, the similarity reaches a maximum value $\approx 0.5$ for
small $f$, and it decays slowly down to $\approx 0.2$ for large $f$.


C. Analysis of the Hierarchical Modular Structure 
of Complex Networks 

The experiments of the previous section can be repeated using empirical
networks - as opposed to network models - except for the computation of the
fidelity because, a priori, it is not clear which is the corresponding
reference hierarchy. Notice, however, that a reference hierarchy is not
necessarily unavailable for every empirical network. Many empirical networks
have an associated hierarchical decomposition that can be used as a
“ground-truth” about their hierarchical structure. Let us remark
here that by ground-truth we refer to the practical use of 
the term uni- For example, the NAICS mi codes for the 
case of financial networks [101 l42ll44j . and the Harmo¬ 
nized System |45j for the case of the international trade 
network |151148| . However, these studies are left open for 
future research and, in what follows, only consistencies 
and similarities are analyzed in different empirical net¬ 
works. 

The networks in Table I (and the references therein) are the ones studied in
the following analysis. All of these networks have convenient characteristics:
they are large enough to show relatively rich hierarchical community
structures (e.g. Infomap returns up to five hierarchical levels for the case
of the Power-grid), with diverse shapes (e.g. compare the case of the
Power-grid in Fig. 8 with the case of the Network-science in Fig. 10a), and
small enough to keep the computation time bounded. Originally, some of these
networks had link weights or self-loops. For the sake of simplicity, such
attributes are removed from the networks. As an illustration of how different
the hierarchical community structures identified by the different community
detection methods are, Fig. 8 shows the results for the Power-grid network. In
this figure, it is apparent that the different methods provide substantially
different results.

In order to enrich the analysis, the topology of the empirical networks is
shuffled, following the same procedure applied to the Sierpinski networks
(cf. Section III B 2). In this way, the obtained hierarchies are analyzed as a
function of the fraction $f$ of randomized links.

Firstly, Infomap is used to study the consistency of the empirical networks.
The results are shown in Fig. 9a. For some networks, like the Power-grid and
the Erdos networks, the identified hierarchies are largely affected by the
randomization procedure, i.e. the consistency quickly decays with $f$. This is
particularly reasonable for the Power-grid network, as its hierarchy is
embedded into space; reshuffling the links attenuates the embedding,
rapidly destroying its spatial nature [49ll5T] . There exist 
other networks like EVA, Geometry and Network-science, for which the
consistencies of the identified hierarchies seem quite robust to the
randomization procedure. This can be interpreted in two ways. On the one hand,
it suggests that the hierarchical community structure is mainly determined by
the node degrees in the networks, or by some other topological property that
is not destroyed by the randomization procedure. On the other hand, it may
indicate that the relatively large values of consistency are not significant
from a hierarchical point of view. A closer inspection of the Network-science
network reveals that the latter possibility is the cause. Specifically, just a
relatively small fraction of the network has a rich hierarchical structure
with up to 4 levels. The rest of the network nodes are identified as
communities at depth $l = 1$, which have no children sub-communities (see
Fig. 10a). The hierarchical part is washed out as $f$ grows, eventually
leading to a star-like structure (see Figs. 10b, 10c and 10d). The relatively
large consistency values for large $f$ are the outcome of random coincidences
occurring for these star-like structures.

FIG. 7. (Color Online). The similarity compares how similar the hierarchies
obtained by two different community detection methods are. Here, we compare:
a) Infomap vs HSBM, b) Infomap vs RL, and c) HSBM vs RL. The hierarchical
community structures used to compute the similarities are the same as those
used in Fig. 5. Results are shown as a function of the parameters $\mu$ and
$f$ of the corresponding network models. Legend: HPM, case 1; HPM, case 2;
Sierpinski, $L = 4$; Sierpinski, $L = 5$. In all cases, the bars represent
standard deviations around the mean.

Secondly, the HSBM method is used for the analysis and the results are shown
in Fig. 9b. Overall, a small consistency is obtained. This is because the HSBM
method often finds a single community, except for the Geometry network. This
might suggest that the HSBM finds a rich hierarchical structure for the
Geometry network in the form of nested block models. However, a closer
inspection indicates that the HSBM simply identifies two large communities,
i.e. there is no hierarchy, similar to what is found for the Network-science
network in the case of Infomap. This explains the slow decay of the
consistency curve.

Thirdly, the consistency is studied using the RL method. The results are
shown in Fig. 9c. In all cases, the consistency presents a smooth decay as a
function of the randomization $f$. This is not a surprise because RL tends to
return trees with a large number of sub-communities and levels. Therefore, the
small changes occurring for increasing $f$ lead to small changes in the
consistencies.

TABLE I. Information summary about the empirical network datasets used in the
calculations. $N$ is the number of nodes and $M$ the number of links. Erdos,
Network-science and Geometry are scientific-collaboration networks. The
Power-grid is a technological network, and EVA is a network of corporate
inter-relationships. The networks marked with * were originally weighted.

Network             N        M         Ref.
Power-grid          4,941    6,594     m\
Erdos               6,927    11,850    [Sam]
Network-science*    1,589    2,742     [miM]
Geometry*           7,343    11,898    [51ES]
EVA                 8,497    7,970     [51155]

In Figs. 11a and 11c, the average similarity between the HSBM method and the
other two methods is shown as a function of $f$. Not surprisingly, the values
obtained are small. However, it is interesting to note that, in certain cases,
the similarity is larger than the corresponding values for the consistency.
For example, cf. the Network-science network in Fig. 9b and Fig. 11, for small
values of $f$. Even though at first glance this may seem contradictory, the
explanation is simple. The HSBM tends to return trivial hierarchies, yielding
a value of zero for the hierarchical mutual information. Then, when the
consistency is computed, the number of terms contributing with zero to the
average $\langle i(\mathcal{T};\mathcal{T}') \rangle$ is proportional to
$1 - p^2$, where $p$ is the probability for the HSBM to produce a non-trivial
hierarchy. On the other hand, such probability is $1 - p$ for the case of the
similarity, because neither Infomap nor RL produces trivial hierarchies. In
other words, the chance for zero terms to occur in the case of the consistency
is significantly larger than in the case of the similarity.
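
The counting argument can be made explicit under a simplifying assumption that is not stated in the original: the hierarchies returned by the HSBM are treated as independent draws, each non-trivial with probability $p$, and a term of the average is taken to vanish whenever at least one of the two compared hierarchies is trivial.

```latex
\begin{align}
  P(\text{zero term} \mid \text{consistency}) &= 1 - p^2 , \\
  P(\text{zero term} \mid \text{similarity})  &= 1 - p , \\
  (1 - p^2) - (1 - p) &= p\,(1 - p) \;\ge\; 0
  \quad \text{for } p \in [0,1] .
\end{align}
```

Under these assumptions the consistency average never contains fewer vanishing terms than the similarity average, and the gap $p(1-p)$ is largest at $p = 1/2$.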

In Fig. 11b, the similarity compares the results for Infomap and RL. A sharp
peak can be appreciated at $f \approx 0.05$. This is because Infomap undergoes
a sudden change in the number of identified communities. Namely, the
hierarchies pass from having $\approx 4$ communities at level $l = 1$ to up to
$\approx 40$. This large number of communities at level $l = 1$ is always
present for RL. Therefore, the sharp increase occurs when the number of
communities at level $l = 1$ becomes large for Infomap, i.e. when it becomes
similar for both methods.


1. Temporal Networks 

In this section, the subject of study is slightly modified. Specifically, the
study of traditional complex networks is replaced by the study of correlation
matrices computed from the log-returns of stock prices in the S&P500
[10, 43, 58]. The data is obtained from Yahoo! Finance [59]. In general, the
correlation matrices can be considered as weighted dense networks.

FIG. 8. (Color Online). Hierarchical partition samples, or trees
$\mathcal{T}$, computed from the Power-grid empirical network using:
a) Infomap (left), b) the HSBM method (middle) and c) RL (right). The trees
contain 1099, 40 and 2879 sub-communities, respectively. The sizes of the
sub-communities are proportional to the number of network nodes they contain.
The spring layout is used to distribute the nodes on the plot [57]. Clearly,
the different community detection methods find significantly different
hierarchies.

FIG. 9. (Color Online). The consistency is plotted for the different empirical
networks in Table I as a function of the fraction of randomly rewired links,
$f$, and for the different community detection methods: a) Infomap, b) HSBM,
and c) RL. In all cases, the bars represent standard deviations around the
mean.

Complex networks are not necessarily static, but 
change in time |60j . The temporal aspect of a complex 
network could have dramatic consequences for the be¬ 
havior of the associated system |5TH55] . The correlation 
matrices of the S&P500 - and the associated hierarchi¬ 
cal community structures - can be studied in their time 
evolution [iniisliss]. Therefore, we use the hierarchical 
mutual information to investigate the evolution of the hi¬ 
erarchical community structure of the financial activity 
in the S&P500. 

The data encompasses the 390 stocks which uninterruptedly cover the 3522
working days from January 1st, 1998 until December 31st, 2011, according to
Yahoo! Finance. Each matrix entry of the correlation matrices is given by

$$ C_{ss'} = \frac{\mathrm{Cov}(X_s, X_{s'})}{\sqrt{\mathrm{Var}(X_s)\,\mathrm{Var}(X_{s'})}} . $$

Specifically, the r.h.s. of the equation above is the cross-correlation
between the time series $X_s(t)$ and $X_{s'}(t)$, corresponding to the stocks
$s$ and $s'$, respectively. In general, cross-correlation matrices have
off-diagonal entries in $[-1,1]$, while diagonal entries are equal to one. To
simplify the analysis, the correlation matrices are transformed according to
the expression $W_{ss'} = |C_{ss'}| - \delta_{ss'}$. The transformation
returns a weighted network with non-negative entries and zero diagonal. The
transformed networks are the ones used for the computation of the hierarchies.
Even though more sophisticated approaches exist (see for example Ref. [10]),
for the sake of simplicity the approach taken is the one described above.
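
The construction of the weighted network from price series can be sketched in plain Python as follows. This is an illustration under assumptions, not the authors' pipeline: the function names are hypothetical, each series is a list of positive prices of equal length, and the log-return is taken as the logarithm of the ratio of consecutive prices.

```python
import math
from statistics import mean


def log_returns(prices):
    """Log-returns of one price series (consecutive-day ratios)."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def pearson(x, y):
    """Cross-correlation Cov(x, y) / sqrt(Var(x) Var(y))."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_network(price_series):
    """Weights W[s][r] = |C[s][r]| - delta(s, r) as a nested list."""
    returns = [log_returns(p) for p in price_series]
    n = len(returns)
    # The diagonal is set to 0 directly, since |C[s][s]| - 1 = 0.
    return [
        [0.0 if s == r else abs(pearson(returns[s], returns[r]))
         for r in range(n)]
        for s in range(n)
    ]
```

Restricting each price series to the days of a given time window before calling `correlation_network` yields the network of that window; a series with constant log-returns has zero variance and is not handled by this sketch.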


To perform a temporal analysis, different correlation matrices, or weighted
networks, are computed by processing the data over different time windows
$[t, t+T]$, where $t$ is the initial day of the time window, and $T$ the
window duration, measured in days.

In the following analysis, only the RL community detection method is used,
because the other two methods typically return trivial communities. More
specifically, the other two methods fail to find communities because the
correlation networks are dense [66]. On the other hand, RL is more sensitive
to small differences in the link weights and is therefore able to find
communities in the dense matrices, but at the risk of over-fitting (see
Section III B 3). As already mentioned, more sophisticated methods can be used
to mitigate these undesired tendencies (see Section III B 1). However, such
experiments are left for future work.

FIG. 10. (Color Online). Hierarchical partition samples $\mathcal{T}$,
computed from the Network-science empirical network using Infomap. Each panel
corresponds to a different level of link randomization: a) $f = 0$,
b) $f = 0.2$, c) $f = 0.5$ and d) $f = 1$. In the Network-science network, the
hierarchy is dominated by branches with almost no children at $f = 0$, but two
branches have considerable size and depth. Then, as $f$ grows, the hierarchy
evolves towards a simple star, as shown in d).

FIG. 11. (Color Online). The similarity is plotted for different empirical
networks as a function of the fraction of randomly rewired links $f$, and the
different pairs of community detection methods: a) Infomap vs HSBM,
b) Infomap vs RL, and c) HSBM vs RL. In all cases, the bars represent standard
deviations around the mean.

Two sets of experiments are analyzed; in both cases, for each computed
weighted network, 50 hierarchical community structures, or trees, are
computed. In the first set of experiments, we analyze how the integration
time, or time window length $T$, affects the detected hierarchies. For this
purpose, we compute the following average normalized hierarchical mutual
information,

$$ \langle i_T \rangle := \langle i(\mathcal{T}_T; \mathcal{T}_{T_{\max}}) \rangle . $$

We call this quantity the temporal-scale hierarchical similarity, or simply
the scale-similarity. It compares hierarchies obtained from the full-length
time window against hierarchies obtained from time windows of length $T$. In
all cases, the initial time is chosen to be the first day, $t = 0$. In
Fig. 12a, $\langle i_T \rangle$ is plotted as a function of $T$. As can be
seen, the larger $T$ is, the larger $\langle i_T \rangle$ is. In other words,
the expected behavior is observed because the larger $T$ is, the more similar
$\mathcal{T}_T$ and $\mathcal{T}_{T_{\max}}$ become on average. In particular,
a plateau exists for $1000 \lesssim T < 3000$. This last observation suggests
that changes do not occur smoothly, but that different hierarchical structural
properties emerge at different time scales.

In the second set of experiments, $T$ is fixed at 1500 days and trees are
computed from networks corresponding to different regions of the time line.
More specifically, we introduce the temporal hierarchical auto-similarity (or
auto-similarity), which is defined as

$$ \langle i_{t,\tau} \rangle := \langle i(\mathcal{T}_t; \mathcal{T}_{t+\tau}) \rangle . $$

The auto-similarity compares two sets of hierarchies. The first set is
computed from the data in the time window $[t, t+T]$, and the other set from
the time window starting $\tau$ days later. We analyze the auto-similarity
varying $\tau$ for fixed $t = 1$, and varying $t$ for fixed $\tau = 100$. In
the first case, we study how the time separation $\tau$ affects the hierarchy,
and in the second case we compare hierarchies corresponding to consecutive
time windows as time evolves. In Fig. 12b, both quantities are plotted. On the
one hand, the auto-similarity $\langle i_{t=1,\tau} \rangle$ decays as the
time separation $\tau$ grows (circles), i.e. the hierarchy drifts away from
the initial structure. On the other hand, the auto-similarity fluctuates
around $\langle i_{t,\tau=100} \rangle \approx 0.7$ (triangles), indicating
that the hierarchies of consecutive time windows always share a significant
amount of information.
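
The two auto-similarity curves can be organized as in the following Python sketch. It is a schematic under assumptions: `trees_at(t)` is a hypothetical callable returning the ensemble of trees detected for the window starting at day `t`, `nhmi` is a hypothetical implementation of the normalized hierarchical mutual information, and the average is taken over all cross pairs of the two ensembles.

```python
from itertools import product
from statistics import mean


def auto_similarity(trees_at, nhmi, t, tau):
    """Average nhmi between the trees of windows starting at t and t + tau."""
    pairs = product(trees_at(t), trees_at(t + tau))
    return mean(nhmi(a, b) for a, b in pairs)


def drift_curve(trees_at, nhmi, taus, t=1):
    """Fixed initial day t, varying separation tau (circles)."""
    return [auto_similarity(trees_at, nhmi, t, tau) for tau in taus]


def consecutive_curve(trees_at, nhmi, starts, tau=100):
    """Fixed separation tau, varying initial day t (triangles)."""
    return [auto_similarity(trees_at, nhmi, t, tau) for t in starts]
```

The defaults `t=1` and `tau=100` mirror the values quoted in the text; the window length $T$ is assumed to be handled inside `trees_at`.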


IV. DISCUSSION, CONCLUSIONS AND 
FUTURE WORK 

In this work, the hierarchical mutual information has been introduced, a tool
that generalizes the standard mutual information to the comparison of
hierarchies. More specifically, to the comparison of hierarchical partitions,
which take the form of trees where parts are successively subdivided into
sub-parts and so on. The hierarchical mutual information can be used to
compare the hierarchical community structures of complex networks, in the same
way as the standard mutual information can be used to compare standard
community structures.

FIG. 12. (Color Online). Temporal analysis of the hierarchical community
structure of correlation matrices. The matrices are computed from the
log-returns of the time series of stock prices in the S&P500. In a), the
scale-similarity $\langle i_T \rangle$ determines how the hierarchies change
with the time length $T$ of the time window over which the data is processed.
In b), two different comparisons are presented using the auto-similarity
$\langle i_{t,\tau} \rangle$. With circles, $\langle i_{t=1,\tau} \rangle$
determines how similar the hierarchies at day one are to the hierarchies
$\tau$ days later. With triangles, $\langle i_{t,\tau=100} \rangle$ determines
how similar the hierarchies of consecutive time windows, separated by 100
days, are as time $t$ evolves. In all cases, the bars represent standard
deviations around the mean.

We define here a normalized hierarchical mutual information. The traditional
normalized mutual information satisfies certain properties: it is a quantity
lying in $[0,1]$, and it is equal to one if and only if the compared
partitions are exactly equal. If the normalized hierarchical mutual
information behaves correctly, it should satisfy analogous properties. The
appropriate behavior of the normalized hierarchical mutual information is
extensively tested in numerical experiments. The tests include artificially
generated hierarchical partitions, and the hierarchical community structure of
artificial and empirical complex networks. In all the experiments, the
normalized hierarchical mutual information is found to behave correctly.
However, it should be mentioned that a formal proof of the correct behavior is
not provided in the present work.

The experiments also illustrate the overall behavior of the hierarchical
mutual information. On the one hand, when comparing artificially generated
hierarchies against correspondingly randomized ones, the normalized
hierarchical mutual information was found to decrease with the level of
randomization. On the other hand, a level-by-level randomization analysis of
the hierarchies indicated that the larger the number of randomized levels, the
faster the normalized hierarchical mutual information decays with the
randomization. Another interesting finding was that the normalized
hierarchical mutual information never decays to zero. This effect, also
present in the standard normalized mutual information, occurs because random
(hierarchical) partitions in finite systems share information just by chance.

The experiments also constitute examples of how the hierarchical mutual
information can be used to analyze the hierarchical community structure of
complex networks. Specifically, the hierarchical community structures of
artificial and empirical networks were studied. In the analysis, different
popular community detection methods were utilized, and the results compared.
The tests were performed on two network models and five empirical networks. It
was found that the different methods can return significantly different
hierarchical community structures. The normalized hierarchical mutual
information correctly identifies these differences. It was also shown that the
normalized hierarchical mutual information can be used to compare the detected
hierarchies against the natural, reference ones in the different network
models. In particular, when the parameters of the network models are
appropriate, and the network models tend to generate networks with the
expected hierarchical structures, the normalized hierarchical mutual
information between the identified hierarchies and the expected ones tends to
grow.

In another set of experiments, the normalized hierarchical mutual information
was used to compare the hierarchical community structure of the different
networks - the networks generated by the models, and the empirical networks -
against that of correspondingly randomized networks. As expected, the
normalized hierarchical mutual information was found to decay with the level
of randomization. In a final example, the time evolution of the hierarchical
community structure of correlation matrices was analyzed. Specifically, we
considered correlation matrices computed from the log-returns of stock prices
in the S&P500. This final example epitomizes how the hierarchical mutual
information is useful to study the evolution of temporal networks. In the
analysis, the normalized hierarchical mutual information showed that the
hierarchical community structure of the correlations of stocks changes slowly
in time, while exhibiting important changes at different time scales.

The present work opens several possibilities for future research. The
mathematical framework behind the hierarchical mutual information can be used
to generalize
other information measures, like generalizing the varia¬ 
tion of information m- On a different line of research, 
the normalized hierarchical mutual information can be 
used to systematically benchmark, and compare, the dif¬ 
ferent community detection methods in existence. An¬ 
other interesting future line of research concerns the com¬ 
parison of phylogenetic trees [5S1 EOl EZl El] , where the 
hierarchical mutual information could have useful applications. Finally, the
normalized hierarchical mutual information can be used to compare the
identified hierarchies against the corresponding ground-truth hierarchies that
different data sets might have available. The above examples go without
mentioning the ample possibilities of using and extending this methodology in
the many fields where hierarchical community structures are identified.


V. ACKNOWLEDGMENTS 

J.I.P. and G.C. acknowledge support from: FET IP Project MULTIPLEX
nr. 317532, FET Project SIMPOL nr. 610704, and FET Project DOLFINS
nr. 640772. C.J.T. acknowledges financial support from the URPP on Social
Networks, Universität Zürich, Switzerland. We also acknowledge useful comments
by A. Clauset, M. Rosvall, T. Peixoto and F. Queyroi.


[1] H. A. Simon, Proc. Am. Phil. Soc. 106, 467 (1962). 

[2] P. W. Anderson et al., Science 177, 393 (1972). 

[3] J. H. Holland, Emergence: from chaos to order (Oxford 
University Press, 1998). 

[4] S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, and 
D.-U. Hwang, Phys. Rep. 424, 175 (2006). 

[5] G. Caldarelli, Scale-Free Networks: complex webs in
nature and technology (Oxford University Press, 2007).

[6] M. Newman, Networks: An Introduction (Oxford
University Press, 2010).

[7] M. Kivela, A. Arenas, M. Barthelemy, J. P. Gleeson, 
Y. Moreno, and M. A. Porter, J. Complex Netw. 2, 203 
(2014). 

[8] M. Rosvall and C. T. Bergstrom, PloS ONE 6, e18209
(2011).

[9] T. P. Peixoto, Phys. Rev. X 4, 011047 (2014). 

[10] M. MacMahon and D. Garlaschelli, Phys. Rev. X 5, 
021006 (2015). 

[11] S. Fortunato, Phys. Rep. 486, 75 (2010). 

[12] M. Girvan and M. E. Newman, Proc. Natl. Acad. Sci. 
USA 99, 7821 (2002). 

[13] M. E. J. Newman, Phys. Rev. E 74, 036104 (2006). 

[14] T. Heimo, J. M. Kumpula, K. Kaski, and J. Saramaki, 
J. Stat. Mech. Theor. Exp. 2008, P08007 (2008). 

[15] A. Lancichinetti and S. Fortunato, Phys. Rev. E 80, 
056117 (2009). 

[16] V. Zlatic, A. Gabrielli, and G. Caldarelli, Phys. Rev. E
82, 066109 (2010). 

[17] P. Zhang and C. Moore, Proc. Natl. Acad. Sci. USA 111, 
18144 (2014). 

[18] M. Sales-Pardo, R. Guimera, A. A. Moreira, and L. A. N. 
Amaral, Proc. Natl. Acad. Sci. USA 104, 15224 (2007). 

[19] A. Clauset, C. Moore, and M. E. Newman, Nature 453,
98 (2008). 

[20] A. Arenas, A. Fernandez, and S. Gomez, New J. Phys. 
10, 053039 (2008). 

[21] A. Lancichinetti, S. Fortunato, and J. Kertesz, New J. 
Phys. 11, 033015 (2009). 

[22] A. Lancichinetti, F. Radicchi, J. J. Ramasco, and
S. Fortunato, PloS ONE 6, e18961 (2011).

[23] C. Granell, S. Gomez, and A. Arenas, Int. J. Bifurcat. 
Chaos 22, 1250171 (2012). 

[24] C. Granell, S. Gomez, and A. Arenas, Int. J. Bifurcat. 
Chaos 22, 1230023 (2012).


[25] F. Queyroi, M. Delest, J.-M. Fedou, and G. Melançon,
Data Min. Knowl. Discov. 28, 1107 (2014).

[26] L. Danon, A. Diaz-Guilera, J. Duch, and A. Arenas, J. 
Stat. Mech. Theor. Exp. 2005, P09008 (2005). 

[27] M. Meila, J. Multivar. Anal. 98, 873 (2007). 

[28] B. DasGupta, X. He, T. Jiang, M. Li, J. Tromp, and 
L. Zhang, in Proceedings of the eighth annual ACM-SIAM 
symposium on Discrete algorithms (Society for Industrial 
and Applied Mathematics, 1997) pp. 427-436. 

[29] J. Nielsen, A. K. Kristensen, T. Mailund, and G. N. 
Pedersen, Algorithm. Mol. Biol. 6, 15 (2011). 

[30] F. Shi, Q. Feng, J.-J. Chen, L. Wang, and J. Wang, 
Tsinghua Sci. Technol. 18, 490 (2013). 

[31] F. Queyroi and S. Kirgizov, Inform. Process. Lett. 115, 
689 (2015). 

[32] T. M. Cover and J. A. Thomas, Elements of information 
theory (Wiley-Interscience, Hoboken, N.J, 2006). 

[33] https://www.python.org/ 

[34] http://hierpart.readthedocs.org

[35] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and 
E. Lefebvre, J. Stat. Mech. Theor. Exp. 2008, P10008 
(2008). 

[36] M. E. Newman, Proc. Natl. Acad. Sci. 103, 8577 (2006). 

[37] P. Pons and M. Latapy, Theor. Comput. Sci. 412, 892
(2011).

[38] A. Condon and R. M. Karp, Random Struct. Algor. 18,
116 (2001). 

[39] S. Maslov and K. Sneppen, Science 296, 910 (2002). 

[40] D. Hric, R. K. Darst, and S. Fortunato, Phys. Rev. E 
90, 062805 (2014). 

[41] http://www.census.gov/eos/www/naics/

[42] R. N. Mantegna, Eur. Phys. J. B 11, 193 (1999). 

[43] G. Bonanno, G. Caldarelli, F. Lillo, and R. N. Mantegna,
Phys. Rev. E 68, 046130 (2003). 

[44] G. Bonanno, G. Caldarelli, F. Lillo, S. Micciche,
N. Vandewalle, and R. N. Mantegna, Eur. Phys. J. B , 363
(2004). 

[45] http://www.wcoomd.org 

[46] C. A. Hidalgo, B. Klinger, A.-L. Barabasi, and
R. Hausmann, Science 317, 482 (2007).

[47] M. Barigozzi, G. Fagiolo, and D. Garlaschelli, Phys. Rev. 
E 81, 046104 (2010). 

[48] G. Caldarelli, M. Cristelli, A. Gabrielli, L. Pietronero, 
A. Scala, and A. Tacchella, PloS ONE 7, e47278 (2012). 





[49] H. D. Rozenfeld, C. Song, and H. A. Makse, Phys. Rev. 
Lett. 104, 025701 (2010). 

[50] M. Barthelemy, Phys. Rep. 499, 1 (2011). 

[51] M. Popovic, H. Stefancic, and V. Zlatic, Phys. Rev. Lett. 
109, 208701 (2012). 

[52] D. J. Watts and S. H. Strogatz, Nature 393, 440 (1998). 

[53] J. W. Grossman, Erdos Number Project (2002). 

[54] http://vlado.fmf.uni-lj.si/pub/networks/data/

[55] http://jeffe.cs.illinois.edu/compgeom/biblios.html

[56] K. Norlen, G. Lucas, M. Gebbie, and J. Chuang, in Proc. 
Inter. Telec. Soc. (2002). 

[57] https://networkx.github.io/

[58] M. Tumminello, F. Lillo, and R. N. Mantegna, J. Econ.
Behav. Organ. 75, 40 (2010). 

[59] http://finance.yahoo.com


[60] P. Holme and J. Saramaki, Phys. Rep. 519, 97 (2012). 

[61] M. Starnini, A. Baronchelli, A. Barrat, and R. Pastor- 
Satorras, Phys. Rev. E 85, 056115 (2012). 

[62] R. Pfitzner, I. Scholtes, A. Garas, C. J. Tessone, and
F. Schweitzer, Phys. Rev. Lett. 110, 198701 (2013). 

[63] I. Scholtes, N. Wider, R. Pfitzner, A. Garas,
C. J. Tessone, and F. Schweitzer, Nat. Commun. 5 (2014).

[64] M. Bazzi, M. A. Porter, S. Williams, M. McDonald, D. J. 
Fenn, and S. D. Howison, arXiv:1501.00040 (2014). 

[65] C. Granell, R. K. Darst, A. Arenas, S. Fortunato, and
S. Gomez, Phys. Rev. E 92, 012805 (2015). 

[66] T. Kawamoto and M. Rosvall, Phys. Rev. E 91, 012809 
(2015). 

[67] D. Robinson and L. R. Foulds, Math. Biosci. 53, 131 
(1981). 

[68] L. van Iersel, S. Kelk, N. Lekic, and L. Stougie, SIAM
J. Discrete Math. 28, 49 (2014). 



